CUDA and OpenCL-based asynchronous PSO
Authors
Abstract
1. GPU-BASED PSO PARALLELIZATION

In ‘synchronous’ PSO, the positions and velocities of all particles are updated in turn within each ‘generation’, after which each particle’s new fitness is evaluated. The value of the social attractor is updated only at the end of each generation, once the fitness values of all particles are known. The ‘asynchronous’ version of PSO, instead, allows the social attractors to be updated immediately after each particle’s fitness is evaluated, which lets the swarm move more promptly towards newly found optima. In asynchronous PSO, the velocity and position update equations can be applied to any particle at any time, in no specific order.

The most common GPU implementations of PSO assign one thread per particle and do not take full advantage of the GPU’s power when evaluating the fitness function in parallel: parallelization occurs only over the particles of the swarm and ignores the dimensionality of the problem. In our parallel implementations: (i) we designed the thread parallelization to be as fine-grained as possible, considering that, in PSO, velocity and position updates occur independently over each dimension; (ii) we implemented an ‘asynchronous’ PSO which, despite updating all particles in parallel, allows each of them to update the social attractor without waiting for all the other particles’ fitness values to be evaluated. A block diagram representing the GPU execution of our parallel asynchronous PSO is shown in Figure 1.
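As a rough illustration of these two choices, the CUDA sketch below (a minimal, hypothetical example, not the implementation described in the paper) assigns one thread to each (particle, dimension) pair in the update kernel and lets each particle refresh the shared best with a single atomic operation as soon as its own fitness has been computed. The kernel and variable names, the sphere fitness function, and the swarm and problem sizes are all assumptions made for this example.

// Illustrative sketch only (assumed names and a sphere fitness), not the
// paper's code; error checking omitted for brevity.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

#define NUM_PARTICLES 64
#define DIM           32
#define ITERATIONS    100
#define W  0.729f   // inertia weight
#define C1 1.494f   // cognitive coefficient
#define C2 1.494f   // social coefficient

// Fine-grained update: one thread per (particle, dimension) pair, so the
// velocity/position update is parallelized over dimensions, not just particles.
__global__ void updateKernel(float *pos, float *vel, const float *pbest,
                             const float *gbest, const float *rnd)
{
    int p = blockIdx.x;                      // particle index
    int d = threadIdx.x;                     // dimension index
    int i = p * DIM + d;
    vel[i] = W * vel[i]
           + C1 * rnd[2 * i]     * (pbest[i] - pos[i])
           + C2 * rnd[2 * i + 1] * (gbest[d] - pos[i]);
    pos[i] += vel[i];
}

// Fitness evaluation with an 'asynchronous' best update: each particle tries
// to improve the shared best the moment its own fitness is known, through a
// single atomic, rather than through a reduction that waits for the whole swarm.
__global__ void fitnessKernel(const float *pos, float *pbestFit, float *pbest,
                              unsigned long long *gbestPacked)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= NUM_PARTICLES) return;
    float f = 0.0f;                          // sphere function as placeholder
    for (int d = 0; d < DIM; ++d) {
        float x = pos[p * DIM + d];
        f += x * x;
    }
    if (f < pbestFit[p]) {                   // refresh personal best
        pbestFit[p] = f;
        for (int d = 0; d < DIM; ++d) pbest[p * DIM + d] = pos[p * DIM + d];
    }
    // Pack (fitness, particle id); for non-negative floats the IEEE-754 bit
    // pattern preserves ordering, so one 64-bit atomicMin (needs a GPU that
    // supports it) keeps the pair consistent without locks or barriers.
    unsigned long long packed =
        ((unsigned long long)(unsigned int)__float_as_int(f) << 32) |
        (unsigned int)p;
    atomicMin(gbestPacked, packed);
}

int main()
{
    const int N = NUM_PARTICLES * DIM;
    float *pos, *vel, *pbest, *pbestFit, *gbest, *rnd;
    unsigned long long *gbestPacked;
    cudaMallocManaged(&pos, N * sizeof(float));
    cudaMallocManaged(&vel, N * sizeof(float));
    cudaMallocManaged(&pbest, N * sizeof(float));
    cudaMallocManaged(&pbestFit, NUM_PARTICLES * sizeof(float));
    cudaMallocManaged(&gbest, DIM * sizeof(float));
    cudaMallocManaged(&rnd, 2 * N * sizeof(float));
    cudaMallocManaged(&gbestPacked, sizeof(unsigned long long));
    for (int i = 0; i < N; ++i) {            // random start in [-10, 10]^DIM
        pos[i] = pbest[i] = 20.0f * rand() / RAND_MAX - 10.0f;
        vel[i] = 0.0f;
    }
    for (int p = 0; p < NUM_PARTICLES; ++p) pbestFit[p] = 1e30f;
    for (int d = 0; d < DIM; ++d) gbest[d] = pos[d];
    *gbestPacked = ~0ULL;                    // "no best found yet"

    for (int it = 0; it < ITERATIONS; ++it) {
        fitnessKernel<<<1, NUM_PARTICLES>>>(pos, pbestFit, pbest, gbestPacked);
        cudaDeviceSynchronize();
        // Copy the winning particle's personal best into the social attractor.
        int bestP = (int)(*gbestPacked & 0xffffffffULL);
        for (int d = 0; d < DIM; ++d) gbest[d] = pbest[bestP * DIM + d];
        for (int i = 0; i < 2 * N; ++i)      // host-side RNG for brevity
            rnd[i] = (float)rand() / RAND_MAX;
        updateKernel<<<NUM_PARTICLES, DIM>>>(pos, vel, pbest, gbest, rnd);
        cudaDeviceSynchronize();
    }
    unsigned int bits = (unsigned int)(*gbestPacked >> 32);
    float bestFit;
    memcpy(&bestFit, &bits, sizeof bestFit); // reinterpret fitness bits on host
    printf("best fitness after %d iterations: %g\n", ITERATIONS, bestFit);
    return 0;
}

Packing the fitness bits and the particle index into one 64-bit word lets a single atomicMin keep the pair consistent; this is only a stand-in for whatever barrier-free update scheme an actual asynchronous implementation would use, and the per-iteration host-side refresh of the social attractor is likewise a simplification for the sake of a short, self-contained example.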
Similar articles
Swan: A tool for porting CUDA programs to OpenCL
The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-in...
A Performance Comparison of CUDA and OpenCL
CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, we compare the performance of CUDA and OpenCL usin...
Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries
We present a comparison of several modern C++ libraries providing high-level interfaces for programming multi- and many-core architectures on top of CUDA or OpenCL. The comparison focuses on the solution of ordinary differential equations and is based on odeint, a framework for the solution of systems of ordinary differential equations. Odeint is designed in a very flexible way and may be easily ...
Evaluating Performance and Portability of OpenCL Programs
Recently, OpenCL, a new open programming standard for GPGPU programming, has become available in addition to CUDA. OpenCL can support various compute devices due to its higher abstraction programming framework. Since there is a semantic gap between OpenCL and compute devices, the OpenCL C compiler plays important roles to exploit the potential of compute devices and therefore its capability sho...
Extending OmpSs to support CUDA and OpenCL in C, C++ and Fortran Applications
CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although this model is a clear improvement over a low-level and ad-hoc programming model for each hardware accelerator, it is still too complex and cumberso...